    GPU-oriented architecture for an end-to-end image/video codec based on JPEG2000

    Modern image and video compression standards employ computationally intensive algorithms that provide advanced features to the coding system. Current standards often need to be implemented in hardware or with expensive solutions to meet the real-time requirements of some environments. Contrary to this trend, this paper proposes an end-to-end codec architecture running on inexpensive Graphics Processing Units (GPUs) that is based on, though not compatible with, the JPEG2000 international standard for image and video compression. When executed on a commodity Nvidia GPU, it achieves real-time processing of 12K video. The proposed software architecture uses four CUDA kernels that minimize memory transfers, use registers instead of shared memory, and employ a double-buffer strategy to optimize data streaming. The throughput analysis indicates that the proposed codec is, on average, at least 10× faster than JPEG2000 implementations devised for CPUs and approximately 4× faster than hardwired implementations of the HEVC/H.265 video compression standard.
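    The double-buffer idea can be illustrated with a small host-side analogue: while one buffer is being encoded, the next one is already being filled, so transfer time hides behind computation. This is only a conceptual sketch under assumptions, not the paper's CUDA code; `process_stream`, `load` and `encode` are hypothetical names standing in for the data transfer and the codec kernels. A real CUDA implementation would typically overlap `cudaMemcpyAsync` copies and kernel launches on separate CUDA streams rather than use Python threads.

```python
from concurrent.futures import ThreadPoolExecutor


def process_stream(frames, load, encode):
    """Encode frame i while frame i+1 is being loaded (two buffers in flight).

    `load` and `encode` are hypothetical callables: `load` models the
    transfer of a frame into a buffer, `encode` models the codec work.
    """
    results = []
    if not frames:
        return results
    # A single background worker plays the role of the transfer engine.
    with ThreadPoolExecutor(max_workers=1) as loader:
        pending = loader.submit(load, frames[0])  # start filling the first buffer
        for i in range(len(frames)):
            current = pending.result()  # wait until this buffer is full
            if i + 1 < len(frames):
                # Begin filling the other buffer before encoding this one.
                pending = loader.submit(load, frames[i + 1])
            results.append(encode(current))  # overlaps with the next load
    return results
```

    For example, `process_stream([1, 2, 3], lambda f: f * 10, lambda b: b + 1)` returns `[11, 21, 31]`; the results keep the input order because each buffer is consumed before the following one is requested.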

    Complexity scalable bitplane image coding with parallel coefficient processing

    Very fast image and video codecs are a goal pursued in both academia and industry. This paper presents a complexity-scalable, parallel bitplane coding engine for wavelet-based image codecs. The proposed method processes the coefficients in parallel, which suits hardware architectures based on vector instructions. Our previous work is extended with a mechanism that provides complexity scalability to the system. This feature allows the coder to regulate its throughput at the expense of a slight penalty in compression efficiency. Experimental results suggest that, at the fastest setting, the method almost doubles the throughput of our previous engine while penalizing compression efficiency by about 10
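    As a rough, hypothetical sketch of what "bitplane coding" operates on (not the authors' engine, which also involves context modelling and entropy coding not shown here), the functions below split integer wavelet coefficients into a sign and magnitude bitplanes, scanned from most to least significant. Within one bitplane every coefficient's bit is computed independently of the others, which is the kind of per-coefficient parallelism a vector unit can exploit. The `skip_planes` parameter is an assumed, illustrative speed knob: dropping the lowest bitplanes does less work at the cost of coarser reconstruction, loosely mirroring the throughput-versus-efficiency trade-off in the abstract; the paper's actual complexity-scalability mechanism is not described there.

```python
def bitplanes(coeffs, skip_planes=0):
    """Split integer coefficients into signs and magnitude bitplanes.

    Returns (num_planes, signs, planes) with planes[0] being the most
    significant plane. `skip_planes` (hypothetical, must be >= 0) drops
    that many least-significant planes.
    """
    mags = [abs(c) for c in coeffs]
    signs = [c < 0 for c in coeffs]
    num_planes = max(mags, default=0).bit_length()
    planes = []
    for p in range(num_planes - 1, skip_planes - 1, -1):
        # Each bit depends only on its own coefficient, so a whole plane
        # could be produced by one vector instruction or one GPU warp.
        planes.append([(m >> p) & 1 for m in mags])
    return num_planes, signs, planes


def reconstruct(num_planes, signs, planes):
    """Rebuild coefficients from however many planes were kept."""
    mags = [0] * len(signs)
    for offset, plane in enumerate(planes):
        p = num_planes - 1 - offset  # bit position of this plane
        for i, bit in enumerate(plane):
            mags[i] |= bit << p
    return [-m if s else m for m, s in zip(mags, signs)]
```

    With all planes kept, `reconstruct(*bitplanes(x))` returns `x` unchanged; with `skip_planes=2`, the coefficients `[5, -3, 0, 12]` come back as `[4, 0, 0, 12]`, i.e. quantized to multiples of 4.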

    Optimization of a bioinformatics sequence alignment application running on multi-core and many-core processors (GPUs)

    Sequence alignment applications are an important tool for the scientific community. These bioinformatics applications are used in many different fields, such as medicine, biology, pharmacology, and genetics. Today, sequence alignment algorithms are highly complex and must handle an ever-growing volume of data. For this reason, alternatives must be found so that these applications can cope with the steadily increasing size of sequence databases. This project studies and investigates improvements to these applications, such as the use of parallel systems, which can improve performance significantly.

    Alignment of genetic sequences on multicore processors

    This work analyzes the performance of the sequence alignment algorithm known as Needleman-Wunsch on three different multiprocessor computing systems. The serial algorithm is analyzed and coded in the C programming language, and a series of optimizations is proposed with the goal of minimizing memory requirements and computation time. The program's performance is then analyzed on the different computing systems. In the second part of the work, the serial algorithm is parallelized using OpenMP. The result is two variants of the program that differ in the ratio of computation to communication. In the first variant, communication between processors is infrequent and occurs after long periods of execution (coarse granularity). In the second variant, individual tasks are relatively short in execution time and communication between processors is frequent (fine granularity). Both variants are executed and analyzed on multicore architectures that exploit thread-level parallelism. The results show the importance of understanding and being able to analyze the effect of multicore and multithreading on performance.
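    The Needleman-Wunsch recurrence analyzed in this work can be sketched as follows. This is a minimal illustration, not the authors' C/OpenMP code: it assumes a linear gap penalty and hypothetical default scores (match +1, mismatch -1, gap -1), and it computes only the optimal score, not the alignment itself. The second function visits the same cells by anti-diagonals; because every cell on an anti-diagonal depends only on the two previous anti-diagonals, those cells are mutually independent, which is one common way to expose fine-grained parallelism. The abstract does not state whether either of the work's variants uses this particular scheme.

```python
def _init(n, m, gap):
    # H[i][j] = best score for aligning a[:i] with b[:j]
    H = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        H[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        H[0][j] = j * gap  # b[:j] aligned against gaps only
    return H


def _cell(H, a, b, i, j, match, mismatch, gap):
    s = match if a[i - 1] == b[j - 1] else mismatch
    H[i][j] = max(H[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                  H[i - 1][j] + gap,    # gap in b
                  H[i][j - 1] + gap)    # gap in a


def nw_score_rowwise(a, b, match=1, mismatch=-1, gap=-1):
    """Serial row-by-row fill of the dynamic-programming matrix."""
    H = _init(len(a), len(b), gap)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            _cell(H, a, b, i, j, match, mismatch, gap)
    return H[len(a)][len(b)]


def nw_score_wavefront(a, b, match=1, mismatch=-1, gap=-1):
    """Same recurrence, filled one anti-diagonal (i + j == d) at a time."""
    n, m = len(a), len(b)
    H = _init(n, m, gap)
    for d in range(2, n + m + 1):
        # Cells on this anti-diagonal are independent of each other,
        # so this inner loop could be split across threads.
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            _cell(H, a, b, i, d - i, match, mismatch, gap)
    return H[n][m]
```

    Both functions return the same score; in a C/OpenMP version, the inner loop of the anti-diagonal traversal is the natural candidate for a `#pragma omp parallel for`, whereas in row-wise order each cell depends on its left neighbour in the same row.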

    Approaching long genomic regions and large recombination rates with msParSm as an alternative to MaCS

    The msParSm application is an evolution of msPar, the parallel version of the coalescent simulation program ms. It removes the limitation on simulating long stretches of DNA sequence with large recombination rates, without compromising the accuracy of the standard coalescent. This work introduces msParSm, describes its significant performance improvements over msPar and the details of its shared-memory parallelization, and shows that it can achieve execution times similar to, if not better than, those of MaCS. Two case studies with different mutation rates were analyzed, one approximating the human average and the other approximating the Drosophila melanogaster average. Source code is available at https://github.com/cmontemuino/msparsm

    Optimization of a bioinformatics sequence alignment application running on many-core processors (GPUs)

    Genomic sequence analysis tools allow biologists to identify and understand fundamental regions involved in genetic diseases. There is currently a need to provide the scientific community with efficient analysis tools. This project carries out a characterization and performance analysis of algorithms used to compare complete genomic sequences, executed on MultiCore and ManyCore architectures. Based on this analysis, the suitability of these architectures for solving the genomic sequence comparison problem is evaluated. Finally, a series of modifications to the implementations of these algorithms is proposed with the aim of improving performance.

    GPU implementation of bitplane coding with parallel coefficient processing for high performance image compression

    The fast compression of images is a requirement in many applications such as TV production, teleconferencing, and digital cinema. Many of the algorithms employed in current image compression standards are inherently sequential. High-performance implementations of such algorithms often require specialized hardware such as field-programmable gate arrays (FPGAs). Graphics Processing Units (GPUs) do not commonly achieve high performance on these algorithms because the algorithms do not exhibit fine-grain parallelism. Our previous work introduced a new core algorithm for wavelet-based image coding systems, tailored for massively parallel architectures, called bitplane coding with parallel coefficient processing (BPC-PaCo). This paper introduces the first high-performance, GPU-based implementation of BPC-PaCo. A detailed analysis of the algorithm guides its implementation on the GPU. The main insights behind the proposed codec are an efficient thread-to-data mapping, smart memory management, and the use of efficient cooperation mechanisms to enable inter-thread communication. Experimental results indicate that the proposed implementation meets the requirements of high-resolution (4K) digital cinema in real time, yielding speedups of 30× with respect to the fastest implementations of current compression standards. In addition, a power consumption evaluation shows that our implementation consumes 40× less energy for equivalent performance than state-of-the-art methods.

    Implementation of the DWT in a GPU through a register-based strategy

    The release of the CUDA Kepler architecture in March 2012 provided Nvidia GPUs with a larger register memory space and with instructions for communicating register values among threads. This enables a new programming strategy that uses registers, rather than shared memory, for sharing and reusing data. Such a strategy can significantly improve the performance of applications that reuse data heavily. This paper presents a register-based implementation of the Discrete Wavelet Transform (DWT), the prevailing data decorrelation technique in the field of image coding. Experimental results indicate that the proposed method is at least four times faster than the best GPU implementation of the DWT found in the literature. Furthermore, theoretical analysis and experimental tests agree in showing that the execution times achieved by the proposed implementation are close to the GPU's performance limits.
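    The abstract does not specify which wavelet filter the paper uses. As an illustrative, assumed example, one decomposition level of the reversible CDF 5/3 integer lifting transform (the filter JPEG2000 uses for lossless coding) can be sketched in plain Python as below, with whole-sample symmetric extension at the borders. The names `fdwt53`, `idwt53` and `_d_at` are hypothetical, and the sketch says nothing about the register-based CUDA mapping itself; it only shows the data reuse such a mapping exploits, since each output coefficient reads neighbouring samples that adjacent threads also need.

```python
def _d_at(d, k):
    # Symmetric extension of the high-pass band: d[-1] -> d[0], d[len] -> d[-1].
    if k < 0:
        return d[0]
    if k >= len(d):
        return d[-1]
    return d[k]


def fdwt53(x):
    """One level of the reversible 5/3 lifting DWT on a list of ints.

    Returns (s, d): low-pass (even) and high-pass (odd) coefficients.
    """
    n = len(x)
    if n < 2:
        return list(x), []
    # Predict step: each odd sample minus the floor-average of its even neighbours.
    d = []
    for k in range(n // 2):
        right = x[2 * k + 2] if 2 * k + 2 < n else x[2 * k]  # mirror at the end
        d.append(x[2 * k + 1] - ((x[2 * k] + right) >> 1))
    # Update step: each even sample plus a rounded quarter of neighbouring details.
    s = [x[2 * k] + ((_d_at(d, k - 1) + _d_at(d, k) + 2) >> 2)
         for k in range((n + 1) // 2)]
    return s, d


def idwt53(s, d):
    """Exact inverse of fdwt53: undo the update, then the predict step."""
    if not d:
        return list(s)
    n = len(s) + len(d)
    x = [0] * n
    for k in range(len(s)):  # recover even samples first
        x[2 * k] = s[k] - ((_d_at(d, k - 1) + _d_at(d, k) + 2) >> 2)
    for k in range(len(d)):  # then odd samples, which need the even ones
        right = x[2 * k + 2] if 2 * k + 2 < n else x[2 * k]
        x[2 * k + 1] = d[k] + ((x[2 * k] + right) >> 1)
    return x
```

    Because every lifting step uses integer arithmetic and is undone in reverse order, `idwt53(*fdwt53(x))` reproduces `x` exactly; for the ramp `[1, 2, 3, 4, 5, 6, 7, 8]`, `fdwt53` yields `s = [1, 3, 5, 7]` and `d = [0, 0, 0, 1]`. A 2-D transform would apply the same step along rows and then along columns.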

    Copper distribution and acid‐base mobilization in vineyard soils and sediments from Galicia (NW Spain)

    In northern Spain and elsewhere in the world, many vineyards are located on steep slopes and are susceptible to accelerated soil erosion. Contaminants, notably Cu, originating from repeated application of copper‐based fungicides to the vines to prevent mildew, are transported and stored in the sediments deposited close to valley bottoms. In this study, the contents and distribution of Cu in 17 soil samples and 21 sediment samples collected from vineyard stands were determined. In addition, the effect of pH on Cu release from vineyard soils and sediments was quantified. The total Cu content (Cu_T) in the soils varied between 96 and 583 mg kg−1, and was between 1.2 and 5.6 times greater in sediment samples. The mean concentration of potentially bioavailable Cu (Cu_EDTA) in the sediments was 199 mg kg−1 (46% of Cu_T), and was 80 mg kg−1 (36% of Cu_T) in the soils. Copper bound to soil organic matter (Cu_OM) was the dominant fraction in the soils (on average, 53% of Cu_T), while in sediment samples Cu_OM values varied between 37 and 712 mg kg−1 and were significantly greater (P < 0.01) than in the soils. Copper associated with non‐crystalline inorganic components (Cu_IA) was the second most important fraction in the sediments, in which it was 3.4 times greater than in the soils. Release of Cu due to changes in pH followed a U‐shaped pattern in soils and sediments. The release of Cu increased when the pH decreased below 5.5 due to the increased solubility of the metal at this pH. When the pH increased above 7.5, Cu and organic matter were released simultaneously.
    Ministerio de Educación y Ciencia | Ref. AGL2006-04231/AG